DETAILED ACTION

Claims 17-52 were previously pending in the final action mailed on December 18, 2009.

In the amendments filed on February 2, 2010, 

Claims 27 and 42 are cancelled; 

Claims 17, 32, 47 and 50 are amended; 
Claims 17-26, 28-41 and 43-52 are now pending. 
Claims 17-26, 28-41 and 43-52 are rejected. 



Continued Examination Under 37 CFR 1.114 

1. A second request for continued examination under 37 CFR 1.114, including the
fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since
this application is eligible for continued examination under 37 CFR 1.114, and the fee set
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submissions filed on February
2, 2010 and February 24, 2010 have been entered.

Response to Amendments 

2. Applicant's arguments and amendments filed on February 2, 2010 have been 
carefully considered. After reviewing the reference previously relied on (i.e. Russell, 
U.S. 5,526,407), Examiner determined that Russell teaches the limitations newly merged
into the independent claims from the dependent claims 27 or 42. Therefore, the
amendments failed to place the claims in condition for allowance.

Applicant's arguments regarding the amendments to the independent claims are 
that Russell did not teach the amended limitations: 

"wherein each information data block contains an information data block 
identifier and audio information data, and signal pause data block contains a signal pause 
data block identifier and signal pause duration data." 

In response, Examiner points to Russell, col. 18, lines 2-13, where Russell 
disclosed a data element that comprises a Phrase ID and Phrase attribute to uniquely 
identify a phrase in speech. Examiner considers the data element to anticipate 
"information data block" and the Phrase ID to anticipate the "information data block 
identifier." Furthermore, implicit in Russell's teaching is that the data element can also
be used to identify pauses in speech in the same way it is used to identify the phrase.

Therefore, Examiner considers Russell to teach the newly added claim limitations. 

Claim Rejections - 35 USC §103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

3. Claims 17-18, 20, 26, 32-33, 35, 41, 46 and 50 are rejected under 35
U.S.C. 103(a) as obvious over Russell et al. (U.S. Patent No. 5,526,407,
hereinafter "Russell"), in view of Yamamoto et al. (U.S. Patent No. 4,355,338, hereinafter
"Yamamoto").

Regarding claim 17, Russell disclosed a method for digitally recording an analog 
audio signal, the method comprising: 

(a) receiving an analog audio signal (Russell, Fig. 2 and col. 9, line 30 disclosed 
"analog front end", which records analog signal) containing audio information and signal 
pauses information (a speech signal by nature contains speech information (i.e., audio) 
and non-speech information (i.e. signal pauses), as evident in Russell, col. 7, lines 4-5); 

(b) converting the analog audio signal into digital audio signal comprising audio 
information data and signal pause duration data (Russell, Fig. 2 and col. 9, lines 31-33); 

(c) storing the audio information data of the digital audio signal as information 
data blocks and the signal pause duration data of the digital audio signal as signal pause 
data blocks having different time durations in a memory (Russell, col. 6, lines 43-53 
disclosed storing the speech stream and structure which represents categorized portions 
of the speech stream; Russell, col. 15 lines 8-11 further disclosed that the characterization 
of speech includes the phrase time duration and the presence of pauses), 

wherein each information data block contains an information data block identifier 
and audio information data, and signal pause data block contains a signal pause data 
block identifier and signal pause duration data (Russell, col. 18, lines 2-13, "a data
element" and "Phrase ID" or col. 18, lines 64-66, "phrase descriptor structure" and
"phrase ID");

and the audio information data and the signal pause duration data of the resulted 
digital audio signal represent outputs at a normal speaking speed (Russell, col. 7, lines 4-7
and col. 7, lines 13-15. In particular, col. 7, lines 13-15 disclosed "When playing the
recalled speech, the present invention may optionally skip the identified speech pauses 
and non-speech utterances," which implies that the pauses and non-speech utterances 
represent output at a normal speaking speed); 

(d) generating a plurality of audio information data sequences by sequentially
reading the information data blocks and the signal pause data blocks (Russell, col. 14,
lines 43-50 disclosed that a speech process program 136 separates the speech into phrases
demarcated by perceptible pauses; col. 17, line 67 disclosed "RIFF chunk" as an audio
information data sequence), the audio information data sequences being separated by the
signal pause data blocks if an assigned time duration of the signal pause data block is
higher than a predetermined time duration (col. 17, line 64 disclosed that the silence of a
specified second duration marks a signal pause).
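
For illustration only, the grouping recited in step (d) can be sketched in Python. The block classes, field names and the 2-second threshold used in the example are hypothetical; they are not taken from Russell or from the claims. The sketch only shows how sequentially read blocks might be grouped into sequences that are separated by pause blocks longer than a predetermined duration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class InfoBlock:
    block_id: str   # hypothetical information data block identifier
    audio: bytes    # audio information data


@dataclass
class PauseBlock:
    block_id: str       # hypothetical signal pause data block identifier
    duration_s: float   # signal pause duration data, in seconds


Block = Union[InfoBlock, PauseBlock]


def split_sequences(blocks: List[Block], min_pause_s: float) -> List[Tuple[int, int]]:
    """Return (start, end) block indices of each audio information data sequence.

    A pause block ends the current sequence only if its duration is
    strictly greater than min_pause_s; shorter pauses stay inside it.
    """
    sequences: List[Tuple[int, int]] = []
    start: Optional[int] = None
    last_info: Optional[int] = None
    for i, block in enumerate(blocks):
        if isinstance(block, InfoBlock):
            if start is None:
                start = i
            last_info = i
        elif block.duration_s > min_pause_s and start is not None:
            sequences.append((start, last_info))
            start = None
    if start is not None:
        sequences.append((start, last_info))
    return sequences


blocks = [
    InfoBlock("I", b"a"),
    PauseBlock("P", 0.3),   # short pause, stays inside the sequence
    InfoBlock("I", b"b"),
    PauseBlock("P", 3.0),   # long pause, separates two sequences
    InfoBlock("I", b"c"),
]
print(split_sequences(blocks, min_pause_s=2.0))
```

The returned start and end indices are the kind of entries an index table could hold, although the actual table layout is not specified here.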

Russell did not explicitly disclose receiving an analog audio signal played at an 
increased speed. 

However, Yamamoto disclosed a method for fast reproduction of music or
language tapes, where it is known in the prior art that, to mass-reproduce music or
language tapes quickly, the source tape can be driven at a high speed (such as 32 times the normal
speed) (see Yamamoto, col. 1, lines 19-22 and lines 38-45).

One of ordinary skill in the art would have been motivated to combine Russell
and Yamamoto because both disclosed a system and method for converting an analog
signal to a digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end";
Yamamoto, Fig. 2, "A/D 9").

Therefore, it would have been obvious for one skilled in the art to apply
Yamamoto's teaching to Russell such that a pre-recorded audio signal is loaded into the
processing device 43 in Fig. 3 in reduced time, allowing the speech processing modules
(Fig. 4, "voice print info", "speech extraction", "codec") sufficient time to process
the speech and produce output without introducing a noticeable delay to the user. The
combination yields the highly desirable result of providing a better user experience.

Regarding claim 18, the combination of Russell and Yamamoto disclosed the 
method of claim 17. 

Russell further disclosed producing an index table by sequentially reading the 
information data blocks and the signal pause data blocks (Russell, col. 14, lines 43-67 
and col. 11, lines 1-17). 

Regarding claim 20, the combination of Russell and Yamamoto disclosed the 
method of claim 18. 

Russell further disclosed wherein producing the index table comprises processing 
the sequentially read data blocks (col. 14, lines 43-67 and col. 11, lines 1-17).

Regarding claim 32, Russell disclosed a method for digitally recording an analog
audio signal with automatic indexing, the method comprising: 

(a) receiving an analog audio signal containing audio information and signal
pauses information (Fig. 2 and col. 9, line 30 disclosed "analog front end", which
records analog signal; a speech signal by nature contains speech information (i.e., audio)
and non-speech information (i.e. signal pauses), as evident in the disclosure in col. 7, lines 4-5);

(b) converting the analog audio signal into digital audio data comprising audio 
information data and signal pause duration data (Fig. 2 and column 9, lines 31-33); 

(c) storing the converted digital audio data (col. 6, lines 44-45, "stores the speech 
stream in at least a temporary storage"); 

(d) reading the stored digital audio data sequentially (col. 14, lines 48-49 
disclosed that the speech process program 136 allocates buffers to receive the real-time 
speech and separate the speech into phrases, which implies that the speech process 
program 136 must read the real-time speech sequentially); 

(e) decoding whether the digital audio data are audio information data or signal 
pause duration data (col. 15, lines 46-53 and col. 17, lines 63-65); 

(f) storing the audio information data as information data blocks and the signal 
pause duration data as signal pause data blocks in a memory (column 6, lines 43-53 
disclosed storing the speech stream and structure which represents categorized portions 
of the speech stream; col. 15, lines 8-11 further disclosed that the characterization of
speech includes the phrase time duration and the presence of pauses), 

wherein each information data block contains an information data block identifier 
and audio information data, and signal pause data block contains a signal pause data
block identifier and signal pause duration data (Russell, col. 18, lines 2-13, "a data
element" and "Phrase ID" or col. 18, lines 64-66, "phrase descriptor structure" and
"phrase ID"); and

(g) reading the stored data blocks sequentially in order to produce a data structure
for managing the indexing (col. 14, lines 43-50 disclosed that a speech process program
136 separates the speech into phrases demarcated by perceptible pauses; col. 17, line 67
disclosed "RIFF chunk" as an audio information data sequence),

wherein a succession of information data blocks which is not interrupted by a 
signal pause with a pre-determined duration being detected as an audio information data 
sequence whose start and end are stored in the data structure for managing the indexing 
(col. 17, lines 63-64 disclosed that sound of at least a certain first threshold duration 
followed by silence of a specified second duration indicates that a phrase has been 
completed; and the time the phrases began and ended are recorded). 

Russell did not explicitly disclose receiving the analog audio signal played at an 
increased speed. 

However, Yamamoto disclosed a method for fast reproduction of music or
language tapes, where it is known in the prior art that, to mass-reproduce music or
language tapes quickly, the source tape can be driven at a high speed (such as 32 times the normal
speed) (see Yamamoto, col. 1, lines 19-22 and lines 38-45).

Application/Control Number: 10/031,471
Art Unit: 2444

One of ordinary skill in the art would have been motivated to combine Russell
and Yamamoto because both disclosed a system and method for converting an analog
signal to a digital signal using an analog-to-digital converter (Russell, Fig. 2, "Analog Front-end";
Yamamoto, Fig. 2, "A/D 9").

Therefore, it would have been obvious for one skilled in the art to apply
Yamamoto's teaching to Russell such that a pre-recorded audio signal is loaded into the
processing device 43 in Fig. 3 in reduced time, allowing the speech processing modules
(Fig. 4, "voice print info", "speech extraction", "codec") sufficient time to process
the speech and produce output without introducing a noticeable delay to the user. The
combination yields the highly desirable result of providing a better user experience.

Claim 47 is rejected under the same rationale as claim 32 as it lists elements that 
are all listed in claim 32 and disclosed by Russell. 

Claim 50 is rejected under the same rationale as claim 32 as it lists elements that 
are all listed in claim 32 and disclosed by Russell. 

Regarding claims 26 and 41, the combination of Russell and Yamamoto 
disclosed the method of claims 17 and 32. 

Russell further disclosed wherein the digital audio data are compressed before 
storage (col. 9, lines 39-43, "codec" and col. 14, line 66, "to compress the speech").

Regarding claim 33, the combination of Russell and Yamamoto disclosed the 
method of claim 32.

Russell further disclosed wherein the data structure produced for managing the
indexing is an index table (Fig. 5 and col. 11, lines 1-27). 

Regarding claim 35, the combination of Russell and Yamamoto disclosed the 
method of claim 33. 

Russell further disclosed processing and producing the index table while 
sequentially reading the data blocks (col. 14, lines 43-67 and col. 11, lines 1-17). 

4. Claims 19 and 34 are rejected under 35 U.S.C. 103(a) as obvious over Russell 
and Yamamoto, in view of Welch et al. (U.S. Patent No. 4,336,421, hereinafter 
"Welch"). 

Regarding claims 19 and 34, the combination of Russell and Yamamoto 
disclosed the method of claims 18 and 33. 

Russell further disclosed the start and end of an audio information data sequence 
are stored as start and end addresses (Russell, Fig. 5 and column 18).

Russell did not explicitly disclose that the start address and end address are for a 
first address pointer and a second address pointer of the index table. The examiner 
interprets the first address pointer and the second address pointer as the pointer to the 
start and end of an audio information data in the memory, as it is unclear to the examiner 
what a first address pointer and a second address pointer of the index table entails.

However, in the same field of endeavor, Welch disclosed storing the start and end
of a speech segment as the start and end addresses in the memory (Welch, col. 13, lines 
42-64). 

One of ordinary skill in the art would have been motivated to combine Russell 
and Welch because both disclosed detecting voice sounds and pauses in speech (Russell, 
"Summary of the Invention" and Welch, "Summary of the Invention"), and Welch 
supplemented Russell's teaching with implementation details relating to buffer 
management for the speech data. 

Therefore, it would have been obvious for one to combine Russell and Welch 
such that Russell's invention can be embodied using techniques taught by Welch. 

5. Claims 21-23, 30, 36-38, 45, 48 and 51 are rejected under 35 U.S.C. 103(a) as 
obvious over Russell and Yamamoto, in view of Freudberg et al. (U.S. Patent No. 
4,696,031, hereinafter "Freudberg"). 

Regarding claims 21, 36, 48 and 51, the combination of Russell and Yamamoto 
disclosed the method of claims 20, 35, 47 and 50. 

Russell further disclosed filtering out short spoken utterances that are not useful 
for the user (Russell, col. 16, lines 21-46). 

Russell did not explicitly disclose filtering out information data blocks whose
number does not exceed a particular minimum value when the signal pauses of the two
adjacent signal pause data blocks exceed a particular first time limit value.

However, Freudberg disclosed filtering out short bursts of energy by combining 
an ON time of less than 200 msec with the OFF times of the two adjacent OFF intervals
to form a single OFF interval, which is essentially the same as what's described in the
claim (column 6, lines 32-49). 
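
The merging step described above can be sketched as follows. This is a minimal Python illustration of the behavior as characterized in this action, not Freudberg's actual implementation; the function name, the interval representation as (state, duration in msec) pairs, and the assumption that ON and OFF intervals alternate are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int]  # (state, duration in msec), state is "ON" or "OFF"


def merge_short_bursts(intervals: List[Interval], min_on_ms: int = 200) -> List[Interval]:
    """Fold ON intervals shorter than min_on_ms into the neighboring OFF time.

    A short ON interval is treated as OFF, and adjacent intervals with the
    same state are combined, so a short burst between two OFF intervals
    yields one single OFF interval covering all three durations.
    """
    merged: List[Interval] = []
    for state, duration in intervals:
        if state == "ON" and duration < min_on_ms:
            state = "OFF"  # short burst of energy: count it as silence
        if merged and merged[-1][0] == state:
            prev_state, prev_duration = merged[-1]
            merged[-1] = (prev_state, prev_duration + duration)
        else:
            merged.append((state, duration))
    return merged


example = [("OFF", 500), ("ON", 100), ("OFF", 400), ("ON", 800)]
print(merge_short_bursts(example))
```

In the example, the 100 msec burst and its two neighboring OFF intervals collapse into a single 1000 msec OFF interval, while the 800 msec ON interval is kept.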

One would have been motivated to combine Russell and Freudberg because both 
disclosed silence (i.e. pause) detection and speech signal segmentation (Russell, col. 17, 
lines 60-65; Freudberg, col. 3, lines 28-40, "signal-ON" and "signal-OFF"). 

Therefore, it would have been obvious for one to incorporate Freudberg's method
of filtering out short energy bursts using threshold values into Russell to filter out
information blocks that are falsely detected as speech blocks so as to save system
processing time and reduce error rate.

Regarding claims 22 and 37, the combination of Russell, Yamamoto and Freudberg
disclosed the method of claims 21 and 36.

Russell did not explicitly disclose wherein the minimum value is 1.

Freudberg disclosed the minimum value for ON time is 200 msec (Freudberg, col. 
6, lines 32-49). 

Examiner considers the difference in the minimum value as an implementation 
choice that may vary in different embodiments of the same inventive idea. 

The rationale for the motivation to combine Russell and Freudberg is the same as 
that provided in the rejection of claim 21.

Regarding claims 23 and 38, the combination of Russell, Yamamoto and 
Freudberg disclosed the method of claims 21 and 36. 

Russell did not explicitly disclose wherein the first time limit value is 0.5 seconds.

Freudberg disclosed using an OFF interval (Freudberg, col. 6, lines 50-55).

Examiner considers the value of 0.5 seconds specified in the claim as an 
implementation choice of Freudberg's OFF interval, which may vary in different 
embodiments of the same inventive idea. 

The rationale for the motivation to combine Russell and Freudberg is the same as 
that provided in the rejection of claim 21.

Regarding claims 30 and 45, the combination of Russell and Yamamoto 
disclosed the method of claims 17 and 32. 

Russell did not explicitly disclose wherein a succession of information data 
blocks which is not separated by a signal pause data block whose signal pause duration 
data amount to a signal pause of more than 2 seconds is detected as an audio information 
data sequence. 

However, Freudberg disclosed a signal pause detection method using an ON time 
to determine the minimum duration of speech intervals (Freudberg, col. 6, lines 32-49). 

Examiner considers the time duration of 2 seconds recited in the present claim as
an implementation choice of Freudberg's ON time, which may vary in different
embodiments of the same inventive concept.

6. Claims 24-25, 31, 39-40, 46, 49 and 52 are rejected under 35 U.S.C. 103(a) as 
obvious over Russell and Yamamoto, in view of Imai et al. (U.S. 2001/0010037, 
hereinafter "Imai").

Regarding claims 24, 39, 49 and 52, the combination of Russell and Yamamoto
disclosed the method of claims 20, 35, 48 and the apparatus of claim 50. 

Russell did not explicitly disclose while processing the data, overwriting the 
signal duration data of signal pause data blocks whose signal pause duration exceeds a 
particular second time limit value with signal duration data having particular nominal 
signal duration. 

However, Imai (2001/0010037) disclosed a speech rate conversion system that 
replaces a non-speech interval exceeding a constant continued time with a break of the 
constant continued time that is shorter than the actual non-speech interval (Imai, [0034]). 

One would have been motivated to combine Russell and Imai because both 
disclosed non-speech interval detection. 

Therefore, it would have been obvious for one to add Imai's speech rate
conversion to Russell to save storage space and adjust playback speed without losing
audio information.

Regarding claims 25 and 40, the combination of Russell, Yamamoto and Imai 
disclosed the method of claims 24 and 39. 

Imai further disclosed wherein the second time limit value is 10 seconds and the 
nominal signal duration is 2 seconds (Examiner considers the threshold values recited in 
the claim as an implementation choice of Imai's constant continued time disclosed in 
[0034] which may vary in different embodiments of the same inventive idea). 
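
The overwriting of long pauses discussed for claims 24-25 and 39-40 can be illustrated with a short Python sketch. The function name and the list-of-durations representation are hypothetical; the default values of 10 seconds and 2 seconds are the ones recited in the claims as summarized above, and the sketch is not a reproduction of Imai's system.

```python
from typing import List


def overwrite_long_pauses(
    pause_durations_s: List[float],
    limit_s: float = 10.0,    # second time limit value, as recited in claims 25 and 40
    nominal_s: float = 2.0,   # nominal signal duration, as recited in claims 25 and 40
) -> List[float]:
    """Replace every pause longer than limit_s with the nominal duration.

    Pauses at or below the limit are left unchanged.
    """
    return [nominal_s if d > limit_s else d for d in pause_durations_s]


print(overwrite_long_pauses([0.5, 12.0, 10.0, 30.0]))
```

Only the 12-second and 30-second pauses are shortened; whether a pause of exactly the limit value should also be overwritten is a design choice that the text above leaves open.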

The rationale for the motivation to combine Russell and Imai is the same as that 
provided in the rejection of claim 24.

Regarding claims 31 and 46, the combination of Russell and Yamamoto
disclosed the method of claims 17 and 32. 

Russell did not explicitly disclose wherein, when receiving the analog audio 
signal, the playing speed of a data medium on which the analog audio signal is recorded 
can be set. 

However, Imai disclosed a method for converting speech rate at a preset scaling 
factor using speech interval detection (Imai, "Abstract"). 

The rationale for the motivation to combine Russell and Imai is the same as that 
provided in the rejection of claim 24. 

7. Claims 28-29 and 43-44 are rejected under 35 U.S.C. 103(a) as obvious over 
Russell and Yamamoto, in view of Gan et al. (IEEE Publication "Implementation of 
Silence Compression Scheme For G.723.1 Speech Coder Using TI TMS320S75 DSP 
Chip", 1997, hereinafter "Gan"). 

Regarding claims 28 and 43, the combination of Russell and Yamamoto 
disclosed the method of claims 17 and 32. 

Russell did not explicitly disclose wherein all the data blocks are of the same size 
and correspond to a particular basic unit of duration. 

However, Gan disclosed an implementation of silence compression for G.723.1
coder, wherein G.723.1 coder has frame size of 30 ms and compressed frame size of 24
bytes at 6.3 kb/s coding rate.
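
The frame figures cited from Gan are mutually consistent, as a short calculation suggests. The Python sketch below is illustrative only; the function name is hypothetical, and it assumes that a coded frame is padded up to a whole number of bytes.

```python
import math
from typing import Tuple


def coded_frame_size(bit_rate_bps: int, frame_ms: int) -> Tuple[int, int]:
    """Return (bits per frame, bytes per frame rounded up to whole bytes)."""
    bits = bit_rate_bps * frame_ms // 1000
    return bits, math.ceil(bits / 8)


# 6.3 kb/s over a 30 ms frame gives 189 bits, which fit in 24 bytes.
print(coded_frame_size(6300, 30))
```

Because every frame covers the same 30 ms and occupies the same number of bytes, fixed-size blocks of this kind correspond to a fixed basic unit of duration.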

One would have been motivated to combine Russell and Gan because both
disclosed silence detection and codec. 

It would have been obvious for one to incorporate Gan's silence compression and 
G.723.1 into Russell as one of the possible embodiments of Russell's speech peripheral.

Regarding claims 29 and 44, the combination of Russell, Yamamoto and Gan 
disclosed the method of claims 28 and 43. 

Russell did not explicitly disclose wherein the basic unit of duration is 30 ms. 

However, as already addressed above in the rejection of claim 28, Gan disclosed a
silence compression implementation for G.723.1 codec, where the G.723.1 codec has
basic frame duration of 30 ms.

The rationale for the motivation to combine Russell and Gan is the same as that 
provided in the rejection of claim 28. 

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to SHIRLEY X. ZHANG whose telephone number is 
(571) 270-5012. The examiner can normally be reached on Monday through Friday
7:30 am - 5:00 pm EST.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, William Vaughn, can be reached on (571) 272-3922. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through Private PAIR only. For 
more information about the PAIR system, see http://pair-direct.uspto.gov. Should you 
have questions on access to the Private PAIR system, contact the Electronic Business 
Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 
Customer Service Representative or access to the automated information system, call 
800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/S.X.Z./ Art Unit 2444
4/13/2010 



/William C. Vaughn, Jr./ 

Supervisory Patent Examiner, Art Unit 2444 



